AITopics | crowdsourced data

Collaborating Authors

crowdsourced data

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

ce9e92e3de2372a4b93353eb7f3dc0bd-Supplemental-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsFeb-19-2026, 12:00:35 GMT

crowdsourced data, dataset, pipeline, (14 more...)

Neural Information Processing Systems

Country:

Africa > Niger (0.07)
Europe > Germany > Saxony > Leipzig (0.04)
Asia > Vietnam (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Communications > Social Media > Crowdsourcing (0.31)

Add feedback

Appendix

Neural Information Processing SystemsAug-19-2025, 00:51:29 GMT

Figure 9: Example showing how a single line of HTML code is rendered by a browser's renderer.

artificial intelligence, natural language, social media, (17 more...)

Neural Information Processing Systems

Country:

Africa > Niger (0.07)
Europe > Germany > Saxony > Leipzig (0.04)
Asia > Vietnam (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Communications > Social Media > Crowdsourcing (0.31)

Add feedback

A New Lens on Homelessness: Daily Tent Monitoring with 311 Calls and Street Images

Jung, Wooyong, Kim, Sola, Kim, Dongwook, Tabar, Maryam, Lee, Dongwon

arXiv.org Artificial IntelligenceAug-12-2025

Homelessness in the United States has surged to levels unseen since the Great Depression. However, existing methods for monitoring it, such as point-in-time (PIT) counts, have limitations in terms of frequency, consistency, and spatial detail. This study proposes a new approach using publicly available, crowdsourced data, specifically 311 Service Calls and street-level imagery, to track and forecast homeless tent trends in San Francisco. Our predictive model captures fine-grained daily and neighborhood-level variations, uncovering patterns that traditional counts often overlook, such as rapid fluctuations during the COVID-19 pandemic and spatial shifts in tent locations over time. By providing more timely, localized, and cost-effective information, this approach serves as a valuable tool for guiding policy responses and evaluating interventions aimed at reducing unsheltered homelessness.

artificial intelligence, machine learning, spatial reasoning, (15 more...)

arXiv.org Artificial Intelligence

2508.06409

Country: North America > United States > California > San Francisco County > San Francisco (0.27)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.32)

Add feedback

Machine Learning for Modeling Wireless Radio Metrics with Crowdsourced Data and Local Environment Features

Qiu, Yifeng, Bose, Alexis

arXiv.org Artificial IntelligenceJan-2-2025

This paper presents a suite of machine learning models, CRC-ML-Radio Metrics, designed for modeling RSRP, RSRQ, and RSSI wireless radio metrics in 4G environments. These models utilize crowdsourced data with local environmental features to enhance prediction accuracy across both indoor at elevation and outdoor urban settings. They achieve RMSE performance of 9.76 to 11.69 dB for RSRP, 2.90 to 3.23 dB for RSRQ, and 9.50 to 10.36 dB for RSSI, evaluated on over 300,000 data points in the Toronto, Montreal, and Vancouver areas. These results demonstrate the robustness and adaptability of the models, supporting precise network planning and quality of service optimization in complex Canadian urban environments.

dataset, modeling, prediction, (10 more...)

arXiv.org Artificial Intelligence

2501.01344

Country:

North America > Canada > Quebec > Montreal (0.26)
North America > Canada > Ontario > Toronto (0.26)
North America > Canada > Ontario > National Capital Region > Ottawa (0.04)
Europe > Denmark (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

RoboCrowd: Scaling Robot Data Collection through Crowdsourcing

Mirchandani, Suvir, Yuan, David D., Burns, Kaylee, Islam, Md Sazzad, Zhao, Tony Z., Finn, Chelsea, Sadigh, Dorsa

arXiv.org Artificial IntelligenceNov-4-2024

In recent years, imitation learning from large-scale human demonstrations has emerged as a promising paradigm for training robot policies. However, the burden of collecting large quantities of human demonstrations is significant in terms of collection time and the need for access to expert operators. We introduce a new data collection paradigm, RoboCrowd, which distributes the workload by utilizing crowdsourcing principles and incentive design. RoboCrowd helps enable scalable data collection and facilitates more efficient learning of robot policies. We build RoboCrowd on top of ALOHA (Zhao et al. 2023) -- a bimanual platform that supports data collection via puppeteering -- to explore the design space for crowdsourcing in-person demonstrations in a public environment. We propose three classes of incentive mechanisms to appeal to users' varying sources of motivation for interacting with the system: material rewards, intrinsic interest, and social comparison. We instantiate these incentives through tasks that include physical rewards, engaging or challenging manipulations, as well as gamification elements such as a leaderboard. We conduct a large-scale, two-week field experiment in which the platform is situated in a university cafe. We observe significant engagement with the system -- over 200 individuals independently volunteered to provide a total of over 800 interaction episodes. Our findings validate the proposed incentives as mechanisms for shaping users' data quantity and quality. Further, we demonstrate that the crowdsourced data can serve as useful pre-training data for policies fine-tuned on expert demonstrations -- boosting performance up to 20% compared to when this data is not available. These results suggest the potential for RoboCrowd to reduce the burden of robot data collection by carefully implementing crowdsourcing and incentive design principles.

demonstration, interface, trajectory, (16 more...)

arXiv.org Artificial Intelligence

2411.01915

Country: North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre:

Research Report > New Finding (1.00)
Instructional Material > Course Syllabus & Notes (0.93)

Industry:

Leisure & Entertainment > Games > Computer Games (0.48)
Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Mixture of Experts based Multi-task Supervise Learning from Crowds

Han, Tao, Shi, Huaixuan, Ding, Xinyi, Ma, Xiao, Gu, Huamao, Fang, Yili

arXiv.org Artificial IntelligenceJul-18-2024

Existing truth inference methods in crowdsourcing aim to map redundant labels and items to the ground truth. They treat the ground truth as hidden variables and use statistical or deep learning-based worker behavior models to infer the ground truth. However, worker behavior models that rely on ground truth hidden variables overlook workers' behavior at the item feature level, leading to imprecise characterizations and negatively impacting the quality of truth inference. This paper proposes a new paradigm of multi-task supervised learning from crowds, which eliminates the need for modeling of items's ground truth in worker behavior models. Within this paradigm, we propose a worker behavior model at the item feature level called Mixture of Experts based Multi-task Supervised Learning from Crowds (MMLC). Two truth inference strategies are proposed within MMLC. The first strategy, named MMLC-owf, utilizes clustering methods in the worker spectral space to identify the projection vector of the oracle worker. Subsequently, the labels generated based on this vector are considered as the inferred truth. The second strategy, called MMLC-df, employs the MMLC model to fill the crowdsourced data, which can enhance the effectiveness of existing truth inference methods. Experimental results demonstrate that MMLC-owf outperforms state-of-the-art methods and MMLC-df enhances the quality of existing truth inference methods.

dataset, ground truth, worker behavior model, (11 more...)

arXiv.org Artificial Intelligence

2407.13268

Country: Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Crowdsourcing with Enhanced Data Quality Assurance: An Efficient Approach to Mitigate Resource Scarcity Challenges in Training Large Language Models for Healthcare

Barai, P., Leroy, G., Bisht, P., Rothman, J. M., Lee, S., Andrews, J., Rice, S. A., Ahmed, A.

arXiv.org Artificial IntelligenceMay-16-2024

Large Language Models (LLMs) have demonstrated immense potential in artificial intelligence across various domains, including healthcare. However, their efficacy is hindered by the need for high-quality labeled data, which is often expensive and time-consuming to create, particularly in low-resource domains like healthcare. To address these challenges, we propose a crowdsourcing (CS) framework enriched with quality control measures at the pre-, real-time-, and post-data gathering stages. Our study evaluated the effectiveness of enhancing data quality through its impact on LLMs (Bio-BERT) for predicting autism-related symptoms. The results show that real-time quality control improves data quality by 19 percent compared to pre-quality control. Fine-tuning Bio-BERT using crowdsourced data generally increased recall compared to the Bio-BERT baseline but lowered precision. Our findings highlighted the potential of crowdsourcing and quality control in resource-constrained environments and offered insights into optimizing healthcare LLMs for informed decision-making and improved patient care.

arxiv preprint arxiv, data quality, quality control, (13 more...)

arXiv.org Artificial Intelligence

2405.1303

Country:

North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > Arizona > Pima County > Tucson (0.04)
North America > United States > Alaska (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Therapeutic Area > Neurology > Autism (0.50)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Data Quality in Crowdsourcing and Spamming Behavior Detection

Ba, Yang, Mancenido, Michelle V., Chiou, Erin K., Pan, Rong

arXiv.org Artificial IntelligenceApr-3-2024

As crowdsourcing emerges as an efficient and cost-effective method for obtaining labels for machine learning datasets, it is important to assess the quality of crowd-provided data, so as to improve analysis performance and reduce biases in subsequent machine learning tasks. Given the lack of ground truth in most cases of crowdsourcing, we refer to data quality as annotators' consistency and credibility. Unlike the simple scenarios where Kappa coefficient and intraclass correlation coefficient usually can apply, online crowdsourcing requires dealing with more complex situations. We introduce a systematic method for evaluating data quality and detecting spamming threats via variance decomposition, and we classify spammers into three categories based on their different behavioral patterns. A spammer index is proposed to assess entire data consistency and two metrics are developed to measure crowd worker's credibility by utilizing the Markov chain and generalized random effects models. Furthermore, we showcase the practicality of our techniques and their advantages by applying them on a face verification task with both simulation and real-world data collected from two crowdsourcing platforms.

experiment, participant, spammer, (16 more...)

arXiv.org Artificial Intelligence

2404.17582

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Texas > Travis County > Austin (0.04)
(5 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry:

Information Technology (0.67)
Health & Medicine (0.67)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications > Social Media > Crowdsourcing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback

Modeling Large-Scale Walking and Cycling Networks: A Machine Learning Approach Using Mobile Phone and Crowdsourced Data

Saberi, Meead, Lilasathapornkit, Tanapon

arXiv.org Artificial IntelligenceApr-2-2024

Walking and cycling are known to bring substantial health, environmental, and economic advantages. However, the development of evidence-based active transportation planning and policies has been impeded by significant data limitations, such as biases in crowdsourced data and representativeness issues of mobile phone data. In this study, we develop and apply a machine learning based modeling approach for estimating daily walking and cycling volumes across a large-scale regional network in New South Wales, Australia that includes 188,999 walking links and 114,885 cycling links. The modeling methodology leverages crowdsourced and mobile phone data as well as a range of other datasets on population, land use, topography, climate, etc. The study discusses the unique challenges and limitations related to all three aspects of model training, testing, and inference given the large geographical extent of the modeled networks and relative scarcity of observed walking and cycling count data. The study also proposes a new technique to identify model estimate outliers and to mitigate their impact. Overall, the study provides a valuable resource for transportation modelers, policymakers and urban planners seeking to enhance active transportation infrastructure planning and policies with advanced emerging data-driven modeling methodologies.

count data, study area, training and testing, (15 more...)

arXiv.org Artificial Intelligence

2404.00162

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
(5 more...)

Genre:

Research Report > New Finding (0.66)
Research Report > Experimental Study (0.48)

Industry:

Health & Medicine (1.00)
Telecommunications (0.93)
Government > Regional Government > Oceania Government > Australia Government (0.69)
Transportation > Infrastructure & Services (0.68)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.46)

Add feedback

Near-real-time Earthquake-induced Fatality Estimation using Crowdsourced Data and Large-Language Models

Wang, Chenguang, Engler, Davis, Li, Xuechun, Hou, James, Wald, David J., Jaiswal, Kishor, Xu, Susu

arXiv.org Artificial IntelligenceDec-4-2023

When a damaging earthquake occurs, immediate information about casualties is critical for time-sensitive decision-making by emergency response and aid agencies in the first hours and days. Systems such as Prompt Assessment of Global Earthquakes for Response (PAGER) by the U.S. Geological Survey (USGS) were developed to provide a forecast within about 30 minutes of any significant earthquake globally. Traditional systems for estimating human loss in disasters often depend on manually collected early casualty reports from global media, a process that's labor-intensive and slow with notable time delays. Recently, some systems have employed keyword matching and topic modeling to extract relevant information from social media. However, these methods struggle with the complex semantics in multilingual texts and the challenge of interpreting ever-changing, often conflicting reports of death and injury numbers from various unverified sources on social media platforms. In this work, we introduce an end-to-end framework to significantly improve the timeliness and accuracy of global earthquake-induced human loss forecasting using multi-lingual, crowdsourced social media. Our framework integrates (1) a hierarchical casualty extraction model built upon large language models, prompt design, and few-shot learning to retrieve quantitative human loss claims from social media, (2) a physical constraint-aware, dynamic-truth discovery model that discovers the truthful human loss from massive noisy and potentially conflicting human loss claims, and (3) a Bayesian updating loss projection model that dynamically updates the final loss estimation using discovered truths. We test the framework in real-time on a series of global earthquake events in 2021 and 2022 and show that our framework streamlines casualty data retrieval, achieving speed and accuracy comparable to manual methods by USGS.

classifier, earthquake, information, (13 more...)

arXiv.org Artificial Intelligence

2312.03755

Country:

North America > Haiti (0.50)
Asia > China > Sichuan Province (0.14)
Asia > Philippines (0.05)
(12 more...)

Genre: Research Report > New Finding (0.93)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Information Technology (0.69)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback